Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2013, Vol. 36, Issue (2): 20-23. doi: 10.13190/jbupt.201302.20.huangshl


A Semi-Supervised Learning Method for Product Named Entity Recognition

HUANG Shi-lin, ZHENG Xiao-lin, CHEN De-ren

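  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received: 2012-09-29 Revised: 2012-10-26 Online: 2013-04-30 Published: 2013-03-25
  • Corresponding author: ZHENG Xiao-lin E-mail: xlzheng@zju.edu.cn
  • Funding:

    National Key Technology R&D Program of China (2012BAH16F02); National Natural Science Foundation of China (61003254)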

A Semi-Supervised Learning Method for Product Named Entity Recognition

HUANG Shi-lin, ZHENG Xiao-lin, CHEN De-ren   

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
  • Received: 2012-09-29 Revised: 2012-10-26 Online: 2013-04-30 Published: 2013-03-25

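Abstract:

For product named entities in the business information domain, the structural features of the parts of a product named entity and the relationships among them are studied, and a three-level semi-supervised learning framework is established. The method combines rules, dictionaries, and statistical techniques to build a hidden conditional random field model, which makes fuller use of the hidden states of the data obtained by bootstrapping. Experiments in the digital camera domain show that the method needs only a small amount of manually labeled data to recognize product named entities in web pages and other text reasonably well.

Keywords: product named entity recognition, business information processing, natural language processing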

Abstract:

A semi-supervised approach to product named entity recognition, based on a three-level framework, is presented. The structural features of the parts of a product named entity and the relationships among them are studied, and a method combining rules, dictionaries, and statistics is applied. A hidden conditional random field model is built to exploit the hidden states of samples obtained by bootstrapping: labels that the bootstrapping algorithm fails to learn are treated as hidden states. Experiments in the digital camera domain show that, with only a small amount of manually labeled data, the method recognizes product named entities in web page text well.

Key words: product named entity recognition, business information processing, natural language processing
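
The idea in the abstract that labels the bootstrapping step cannot assign are kept as hidden states can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the label set, the seed brand dictionary, the digit-based model rule, and the "capitalized token before a model name" bootstrapping heuristic are all hypothetical, and the hidden conditional random field itself is not implemented. The sketch only shows how partially labeled sequences with candidate label sets might be prepared for such a latent-variable model.

```python
import re

# Hypothetical label set and seed resources; none of these come from the paper.
LABELS = ("BRAND", "MODEL", "O")
SEED_BRANDS = {"canon", "nikon"}
# Hypothetical rule: a token containing digits (e.g. "A77", "600D") looks like a model name.
MODEL_RULE = re.compile(r"^[A-Za-z]*\d+[A-Za-z0-9]*$")


def seed_label(tokens, brands):
    """Label what the dictionary and the rule can decide; leave the rest as None (hidden)."""
    labels = []
    for tok in tokens:
        if tok.lower() in brands:
            labels.append("BRAND")
        elif MODEL_RULE.match(tok):
            labels.append("MODEL")
        else:
            labels.append(None)
    return labels


def bootstrap(sentences, brands, rounds=2):
    """Grow the brand dictionary: an unlabeled capitalized token that
    directly precedes a MODEL token is taken as a new brand."""
    brands = set(brands)
    for _ in range(rounds):
        new = set()
        for tokens in sentences:
            labels = seed_label(tokens, brands)
            for i in range(len(tokens) - 1):
                if labels[i] is None and labels[i + 1] == "MODEL" and tokens[i][0].isupper():
                    new.add(tokens[i].lower())
        if not new:
            break  # nothing learned in this round, stop early
        brands |= new
    return brands


def candidate_labels(labels):
    """Positions still hidden keep every label as a candidate,
    to be resolved later by a latent-variable sequence model."""
    return [[lab] if lab is not None else list(LABELS) for lab in labels]


sentences = [
    ["The", "Canon", "600D", "is", "popular"],
    ["Buy", "Sony", "A77", "today"],
]
brands = bootstrap(sentences, SEED_BRANDS)
labels = seed_label(sentences[1], brands)
print(sorted(brands))
print(labels)
print(candidate_labels(labels))
```

In this toy run the bootstrapping round adds one brand to the dictionary, while tokens such as "Buy" and "today" stay unlabeled and keep the full candidate set. A hidden-state model would then marginalize over those candidates instead of forcing a hard label on them.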

CLC number: